
    Compression of Structured High-Throughput Sequencing Data

    Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data either cannot quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution) or fail to provide state-of-the-art compression. We have devised new approaches to store HTS data that support seamless schema evolution and compress datasets substantially better than existing approaches. Building on these approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational, and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous state of the art in compression, these methods reduce dataset size by more than 40% when storing exome, gene expression, or DNA methylation datasets. The approaches have been integrated into a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.

    Funding: National Center for Research Resources (U.S.) (Grant UL1 RR024996); Leukemia & Lymphoma Society of America (Translational Research Program Grant LLS 6304-11); National Institute of Mental Health (U.S.) (R01 MH086883)

    Global Array-Based Transcriptomics from Minimal Input RNA Utilising an Optimal RNA Isolation Process Combined with SPIA cDNA Probes

    Technical advances in the collection of clinical material, such as laser capture microdissection and cell sorting, offer the advantage of yielding more refined and homogeneous cell populations. However, this advantage is counterbalanced by the significant difficulty of obtaining nucleic acid yields adequate for transcriptomic analyses. Established technologies can carry out global transcriptomics using nanograms of input RNA; however, many clinical samples of low cell content would be expected to yield RNA only in the picogram range. To fully exploit these clinical samples, the current report addresses two challenges: isolating an adequate RNA yield directly, and generating sufficient microarray probes for global transcriptional profiling from such low RNA input. We have established an optimised RNA isolation workflow specifically designed to yield maximal RNA from minimal cell numbers. This procedure obtained an RNA yield sufficient for global transcriptional profiling of vascular endothelial cell biopsies, clinical material not previously amenable to global transcriptomic approaches. In addition, by assessing the performance of two linear isothermal probe generation methods at decreasing input levels of good-quality RNA, we demonstrated robust detection of a class of low-abundance transcripts (GPCRs) at picogram-range input levels (50 pg, a lower RNA input than previously reported for global transcriptional profiling), and we report the ability to interrogate the transcriptome from only 10 pg of input RNA. By combining an RNA isolation workflow optimised for samples of low cell content with linear isothermal RNA amplification methods for low RNA input, we were able to perform global transcriptomics on valuable and potentially informative clinically derived vascular endothelial biopsies for the first time. These workflows make it possible to robustly exploit increasingly common clinical samples that yield extremely low cell numbers and RNA amounts for global transcriptomics.

    TET family dioxygenases and DNA demethylation in stem cells and cancers

    The methylation of cytosine and its subsequent oxidation constitute a fundamental epigenetic modification in mammalian genomes, and their abnormalities are intimately coupled to various pathogenic processes, including cancer development. Enzymes of the Ten-eleven translocation (TET) family catalyze the stepwise oxidation of 5-methylcytosine in DNA to 5-hydroxymethylcytosine and further oxidation products. These oxidized 5-methylcytosine derivatives are intermediates in the reversal of cytosine methylation and also serve as stable epigenetic modifications with distinctive regulatory roles. It is becoming increasingly clear that TET proteins and their catalytic products are key regulators of embryonic development, stem cell function, and lineage specification. Over the past several years, the function of TET proteins as a barrier between normal and malignant states has been extensively investigated. Dysregulation of TET protein expression or function is commonly observed in a wide range of cancers. Notably, TET loss of function is causally related to the onset and progression of hematologic malignancy in vivo. In this review, we focus on recent advances in the mechanistic understanding of DNA methylation-demethylation dynamics and their potential regulatory functions in cellular differentiation and oncogenic transformation.